perm filename CH4A[206,JMC] blob
sn#005356 filedate 1972-05-04 generic text, type T, neo UTF8
00100 COMPILING IN LISP
00200
00300
00400 Translating programs from a source language into a target
00500 language is the most common form of computation with symbolic
00600 expressions. Because of its commonness, and because general
00700 computation with symbolic expressions is not much practiced,
00800 compiling is generally treated separately from the rest of symbolic
00900 computation. Nevertheless, it turns out that LISP is a rather good
01000 language for writing compilers and it is especially good for
01100 describing compilers in an understandable way. To that end we shall
01200 describe two compilers, LCOM0 and LCOM4, both of which take
01300 S-expression descriptions of LISP recursive functions and translate
01400 them into machine code for the PDP-10 computer. LCOM0 is about as
01500 simple a LISP compiler as can be written (only one page of
01600 M-expressions), but it compiles rather bulky machine code. LCOM4 is
01700 based on the same ideas as LCOM0 but recognizes many special cases
01800 and produces much better code.
01900
02000 Compilers perform several operations in sequence though some
02100 of them are sometimes combined. These operations are
02200
02300 1. Parsing. This consists of reading the input string of
02400 symbols and making a structure in memory that represents the source
02500 program. Sometimes the program is parsed one statement at a time,
02600 and further operations are done before the next statement is parsed.
02700 The memory structure sometimes consists of tables, sometimes Polish
02800 prefix or postfix notation and sometimes is list structure. In our
02900 case, parsing is trivial, because the LISP read program interprets
03000 the parentheses, spaces and atomic symbols of the source program and
03100 creates a list structure which can be used by the compiler without
03200 further parsing. A compiler for LISP M-expressions would require
03300 parsing before compilation, but we would prefer to do this by giving
03400 a suitable syntax to a general symbolic expression read program
03500 rather than by writing a special parser.
03600
03700 2. Sometimes the source code is improved before being
03800 translated. We don't do that. All optimization is imbedded in the
03900 translator itself.
04000
04100 3. Next the source code is translated into machine code.
04200 This is sometimes symbolic assembly code, sometimes it is binary
04300 code, and sometimes it is relocatable binary code with fixups in the
04400 case of one pass compilers. In our case, the output is list
04500 structure in the LAP assembly language. Usually, using assembly code
04600 as an intermediate form wastes time because character strings have to
04700 be generated by the compiler and subsequently read a character at a
04800 time by the symbolic assembly program. In our case, this need not
04900 occur, because within memory, the LAP code is a list structure.
05000 Pointers to atoms are generated, by the compiler and read by LAP
05100 without the characters in the print names of the atoms ever being
05200 scanned.
05300
05400 4. Optimization of the object code is often done, but we
05500 don't do this, because most of the useful optimizations can be done
05600 in the code generation phase and the bad object code is never
05700 generated.
05800
05900 5. Finally, the assembly code is translated into binary by
06000 LAP. In LISP, the translation is done directly into the space in
06100 memory where the code will eventually run.
06200
06300 In our opinion, some of the techniques used here are better
06400 than many of the usual techniques even for compiling languages like
06500 Algol. In particular, we can recommend representing the source
06600 program as a list structure and doing the optimization along with the
06700 compiling.
06800
06900
07000 4.2. An example of compilation.
07100
07200 We start by showing the LAP code into which LCOM4 translates
07300 our old friend %alt. Remember that
07400
07500 alt x ← if null x ∨ null cdr x then x else a x . alt dd x.
07600
07700 The corresponding LAP code produced by LCOM4 is
07450
07500 (LAP ALT SUBR)
07600 (PUSH P 1)
07700 (MOVE 1 0 P)
07800 (JUMPE 1 G0186)
07900 (HRRZ@ 1 0 P)
08000 (JUMPN 1 G0185)
08100 G0186
08200 (MOVE 1 0 P)
08300 (JRST 0 G0184)
08400 G0185
08500 (HRRZ@ 1 0 P)
08600 (HRRZ@ 1 1)
08700 (CALL 1 (E ALT))
08800 (MOVE 2 1)
08900 (HLRZ@ 1 0 P)
09000 (CALL 2 (E CONS))
09100 G0184
09200 (SUB P (C 0 0 1 1))
09300 (POPJ P)
09400 NIL
09500
09600 We have the following remarks about this object program.
09700
09800 1. (LAP ALT SUBR) tells LISP that this is a LAP program
09900 called ALT and that it is of type SUBR. LAP will assemble it into
10000 binary program space and put a pointer to it on the property list of
10100 the atom ALT. The NIL at the end is just punctuation.
10200
10300 2. The arguments of the function (and there can't be too many
10400 of them) appear in successive accumulators of the PDP-10 starting
10500 with accumulator 1. The PDP-10 computer has 16 accumulators, but
10600 only the 1 thru 5 are used for this purpose.
10700
10800 3. Within the e
10900